Chapter 13

Anomaly Detection

13.1. Overview

This chapter introduces anomaly detection, an unsupervised learning topic that has been attracting increasing attention. We start with basic knowledge about anomaly detection, including its definition, classification, major concepts, and popular algorithms. Skipping the so-called rule-based methods, we first present statistics-based methods to bridge possible gaps between anomaly detection in machine learning and traditional anomaly detection practice. Then, more detailed information is provided for machine learning-based anomaly detection. Because anomaly detection is usually handled by adapting existing machine learning algorithms, popular anomaly detection algorithms from supervised, unsupervised, and semi-supervised machine learning are discussed one by one. Finally, issues in the practice of anomaly detection are summarized.

13.2. Basics of Anomaly Detection

Anomaly detection, also called novelty detection, outlier detection, forgery detection, or out-of-distribution detection in different areas, is intended to identify rare items, events, or observations that differ significantly from the majority of the data and do not conform to a well-defined notion of normal behavior. Anomaly detection has been applied to a variety of areas such as fraud detection, web hack detection, medical (disease) detection, sensor network anomaly detection, IoT big data anomaly detection, log anomaly detection, and industrial hazard detection [135, 136].
The data that can be processed for anomaly detection include continuous data (e.g., time series), text data (e.g., logs), and multi-dimensional data (e.g., images). Data anomalies can appear as point anomalies (individual data points that differ from the others), group anomalies (individual points in a group appear normal while the group as a whole behaves differently from the others), and background anomalies (data points that appear anomalous only against certain backgrounds).
In a broad sense, the available methods for anomaly detection can be roughly grouped into rule-based methods, statistics-based methods, and machine learning-based methods. Among them, the methods based on machine learning algorithms constitute anomaly detection in the narrow, state-of-the-art sense. The machine learning-based methods can be further categorized into supervised, unsupervised, and semi-supervised methods.
The key to rule-based methods is to obtain rules for judging anomalies. Such rules can be derived automatically using algorithms or manually by experts, and are then applied to patterns or behaviors to determine whether they are normal. The advantage of such methods is that they can accurately and conveniently identify anomalies that match a known rule. The disadvantage is the difficulty of identifying good rules: the rule set may be incomplete and need frequent updates, and when the database of rules is large, the comparison can be time-consuming.
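To make this concrete, below is a minimal sketch of a rule-based detector in Python; the rules and the record fields (amount, country, hour) are hypothetical examples, not from any specific system.

```python
# A minimal rule-based anomaly detector: each rule is a named predicate,
# and a record is anomalous if it violates at least one rule.
RULES = [
    ("amount over limit",   lambda r: r["amount"] > 10_000),
    ("foreign transaction", lambda r: r["country"] != "US"),
    ("off-hours activity",  lambda r: r["hour"] < 6 or r["hour"] > 22),
]

def check_record(record):
    """Return the names of all rules that the record violates."""
    return [name for name, rule in RULES if rule(record)]

tx = {"amount": 12_500, "country": "US", "hour": 3}
print(check_record(tx))  # ['amount over limit', 'off-hours activity']
```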
The core of statistics-based methods is to assume that the data follow a certain type of probability distribution and then use the data to estimate the distribution's parameters. The most common methods in this category are $3\sigma$, Boxplot, Grubbs, and Z-score. Such methods are most applicable to low-dimensional data and have good robustness; however, they may suffer from heavy reliance on the adopted statistical assumptions.
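A minimal sketch of two of these methods, assuming one-dimensional data and NumPy, is given below; the thresholds (3 for the Z-score, 1.5 for the IQR multiplier) are the conventional defaults.

```python
import numpy as np

def zscore_outliers(x, threshold=3.0):
    """Flag points whose |Z-score| exceeds the threshold."""
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold

def boxplot_outliers(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 1000), [8.0, -9.0]])  # two injected outliers
print(x[zscore_outliers(x)])   # the injected outliers are flagged
print(x[boxplot_outliers(x)])  # Boxplot may also flag a few borderline points
```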
Machine learning is the major category of anomaly detection methods [137, 138]. In unsupervised learning, common methods can be divided into five groups: statistics-based, distance-based, density-based, clustering-based, and tree-based.
In semi-supervised learning, the labeled data are usually normal data points; popular methods include one-class SVM, autoencoders, and GMM. In supervised learning, we usually need to pay attention to data labeling and imbalanced data as possible issues, and such methods are suitable when the data may contain new classes. Common methods in this category include linear models, SVM, and artificial neural networks. Table 13.1 gives a list of some common methods; a short sketch of two of them follows the table.
Table 13.1: Popular anomaly detection methods
| Category | Algorithm | Criteria |
| :--- | :--- | :--- |
| Distribution | 3 Sigma | $x>\mu+3\sigma$ or $x<\mu-3\sigma$ |
| Distribution | Z-score | Z-score $>3$ |
| Distribution | Boxplot | $x>Q_3+1.5\,IQR$ or $x<Q_1-1.5\,IQR$ |
| Distribution | Grubbs Test | Z-score $>$ Grubbs limit |
| Distance | KNN | KNN distance $>$ threshold |
| Density | LOF | LOF $>$ threshold |
| Density | COF | COF $>$ threshold |
| Density | SOS | Anomaly probability $>$ threshold |
| Clustering | DBSCAN | Cannot be clustered (label $=-1$) |
| Tree | iForest | Anomaly score $>$ threshold |
| Dimension-Reduction-Based | PCA | Dimension bias $>$ threshold |
| Dimension-Reduction-Based | Autoencoder | Reconstruction error $>$ threshold |
| Classification | One-Class SVM | Points outside the hyperplane (label $=-1$) |
| Prediction | Moving Average, ARIMA | Errors + distribution-based methods |
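As a brief illustration of two methods from Table 13.1, the following sketch applies scikit-learn's IsolationForest and OneClassSVM to synthetic two-dimensional data (assuming scikit-learn is installed); both predict $-1$ for anomalies, matching the table's labeling convention.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 2))             # mostly "normal" points
X_test = np.vstack([rng.normal(0.0, 1.0, size=(5, 2)),    # normal points
                    rng.uniform(4.0, 6.0, size=(5, 2))])  # obvious anomalies

iforest = IsolationForest(random_state=0).fit(X_train)
ocsvm = OneClassSVM(nu=0.05, gamma="scale").fit(X_train)

print(iforest.predict(X_test))  # -1 marks anomalies, 1 marks normal points
print(ocsvm.predict(X_test))
```

Here the `nu` parameter of OneClassSVM roughly bounds the fraction of training points treated as outliers, which plays the role of the threshold in the table.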

13.3. Statistics-Based Methods

13.3.1. 3 Sigma

The 3 Sigma (or $3\sigma$) rule is a simple yet widely adopted method for anomaly detection. It is based on the assumption that the data follow a normal distribution and contain only random errors. As shown in Fig. 13.1, the standard deviation of this normal distribution, $\sigma$, quantifies how the data points cluster around the mean. The rule declares a data point an anomaly when it is far from the mean, i.e., more than $3\sigma$ away.
Figure 13.1: 3 Sigma: 99.7% of the data are within 3 standard deviations of the mean
According to the characteristics of the normal distribution, we can obtain the following probabilities (verified numerically in the sketch after this list).
  • The probability that a data point falls between $\mu-\sigma$ and $\mu+\sigma$ is 0.6827.
  • The probability that a data point falls between $\mu-2\sigma$ and $\mu+2\sigma$ is 0.9545.
  • The probability that a data point falls between $\mu-3\sigma$ and $\mu+3\sigma$ is 0.9973.
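These probabilities can be checked empirically. The following sketch simulates standard-normal data with NumPy, measures the coverage of the $1\sigma$, $2\sigma$, and $3\sigma$ intervals, and then applies the $3\sigma$ rule itself.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

# Empirical coverage of the k-sigma intervals (expect ~0.6827, 0.9545, 0.9973).
for k in (1, 2, 3):
    print(f"within {k} sigma: {np.mean(np.abs(x) <= k):.4f}")

# The 3-sigma rule: flag points more than 3 standard deviations from the mean.
anomalies = x[np.abs(x - x.mean()) > 3 * x.std()]
print(f"flagged {anomalies.size} of {x.size} points (~0.27% expected)")
```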